dCDH by_path R-parity fixtures + TestDCDHDynRParityByPath by igerber · Pull Request #360 · igerber/diff-diff

igerber · 2026-04-24T16:42:28Z

Summary

Wave 1 of the by_path follow-up sequence — locks in the per-path SE convention against R DIDmultiplegtDYN 2.3.3 before downstream waves (inference completion, design variants, survey) build on it.
Adds two new scenarios to benchmarks/data/dcdh_dynr_golden_values.json: mixed_single_switch_by_path (2 paths, by_path=2) and multi_path_reversible_by_path (4 paths, by_path=3 via a new deterministic multi-path DGP pattern in the R generator).
New TestDCDHDynRParityByPath class in tests/test_chaisemartin_dhaultfoeuille_parity.py — 2 parity tests matching paths by tuple label and cross-checking per-path switcher counts before SE comparison.
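
The matching step in the last bullet can be sketched roughly as below. This is a minimal illustration, not the test code from this PR: the helper name `match_paths`, the dict layout (results keyed by path tuple, with a `"switchers"` map from horizon to count), and the sample values are all assumptions.

```python
def match_paths(py_paths, r_paths):
    """Pair Python and R per-path results by tuple label.

    Both arguments are hypothetical dicts shaped like
    {path_tuple: {"switchers": {horizon: count}}}.
    """
    # Set equality on labels is order-independent, so it does not rely on
    # how either side orders paths of equal frequency.
    if set(py_paths) != set(r_paths):
        raise AssertionError(
            f"path sets differ: python={sorted(py_paths)} r={sorted(r_paths)}"
        )
    pairs = {}
    for label in py_paths:
        # Switcher counts must agree before an SE comparison is meaningful.
        if py_paths[label]["switchers"] != r_paths[label]["switchers"]:
            raise AssertionError(f"switcher counts differ on path {label}")
        pairs[label] = (py_paths[label], r_paths[label])
    return pairs


# Illustrative values only (not taken from the golden fixture).
py = {
    (0, 1, 1): {"switchers": {1: 12, 2: 12}},
    (0, 1, 0): {"switchers": {1: 7, 2: 7}},
}
r = {
    (0, 1, 0): {"switchers": {1: 7, 2: 7}},
    (0, 1, 1): {"switchers": {1: 12, 2: 12}},
}
matched = match_paths(py, r)
```

Checking counts before SEs means a disagreement about which groups belong to a path surfaces as a count failure rather than as an unexplained SE gap.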

Observed parity

Per-path point estimates: match R exactly (1e-4 rtol on both scenarios)
Per-path switcher counts: match R exactly on every (path, horizon) pair
Per-path SE within Phase 2 multi-horizon envelope: scenario 13 max rtol 10.2%, scenario 14 max rtol 4.2% (class SE_RTOL = 0.12 passes both with margin)
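
As a quick arithmetic check of the envelope claim, the sketch below compares the reported worst-case SE errors against `SE_RTOL`. It assumes the relative error is measured against the R value; the helper `rel_err` is hypothetical and only the three numbers come from the description above.

```python
SE_RTOL = 0.12  # class-level SE tolerance quoted in the PR description


def rel_err(py_value, r_value):
    """Relative error of the Python value, measured against the R reference."""
    return abs(py_value - r_value) / abs(r_value)


# Worst-case per-scenario SE errors as reported above.
observed_max = {"scenario 13": 0.102, "scenario 14": 0.042}

for name, worst in observed_max.items():
    headroom = SE_RTOL - worst
    print(f"{name}: max rtol {worst:.1%}, headroom {headroom:.1%}")
```

Scenario 13 leaves under two percentage points of headroom, so a small SE drift on that scenario would be enough to trip the tolerance.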

Documented deviation

REGISTRY.md Note (Phase 3 by_path...) block now includes:

R-parity confirmation with the observed rtol bounds
**Deviation from R (cross-path cohort-sharing SE):** our full-panel cohort-centered plug-in (joiners/leavers precedent) and R's per-path re-run diverge materially when a (D_{g,1}, F_g, S_g) cohort spans multiple observed paths; the two coincide when every cohort is single-path. Practitioner guidance: interpret per-path SE as a within-full-panel marginal variance, not a per-path conditional variance.

Scenario 14's DGP is constructed to keep cohorts single-path by making path assignment a deterministic function of F_g, demonstrating the regime where Python and R agree up to envelope.
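
Whether a panel falls in the "agreeing" regime can be checked mechanically. The sketch below is an assumption-laden illustration: the function name and the list-of-dicts input (one entry per group, with a `"cohort"` 3-tuple and a `"path"` tuple) are hypothetical, not part of the library.

```python
from collections import defaultdict


def cohorts_spanning_multiple_paths(groups):
    """Return cohorts (D_{g,1}, F_g, S_g) that are observed with more than one path.

    `groups` is a hypothetical list of dicts with keys "cohort" and "path".
    An empty result means every cohort is single-path.
    """
    paths_by_cohort = defaultdict(set)
    for g in groups:
        paths_by_cohort[g["cohort"]].add(g["path"])
    return {c: p for c, p in paths_by_cohort.items() if len(p) > 1}


# Illustrative panel: the cohort (0, 3, 1) is shared by two different paths.
groups = [
    {"cohort": (0, 3, 1), "path": (0, 1, 1)},
    {"cohort": (0, 3, 1), "path": (0, 1, 0)},
    {"cohort": (0, 4, 1), "path": (0, 1, 1)},
]
mixed = cohorts_spanning_multiple_paths(groups)
```

On a panel where `mixed` is non-empty, the documented deviation applies and per-path SEs should not be expected to match R's per-path re-run.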

R script additions

New extract_dcdh_by_path helper reads res$by_levels + res$by_level_1..by_level_k, capturing per-path effects / SE / CI / switcher counts per horizon.
New "multi_path_reversible" pattern in gen_reversible with cohort-clean path assignment.
stopifnot(packageVersion("DIDmultiplegtDYN") >= "2.3.3") at script top — by_path output slots are not version-stable per R package docs.

Wave queue updated

~/.claude/projects/.../memory/project_dcdh_by_path_next_prs.md marks Wave 1 complete with the SE-gap observation. Next up: Wave 2 (inference completion — n_bootstrap > 0, per-path placebos, per-path sup_t_bands).

Test plan

New parity tests pass: pytest tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath -v
Full dCDH parity regression: pytest tests/test_chaisemartin_dhaultfoeuille_parity.py -v (17 passed locally)
Full dCDH behavioral + methodology suites: pytest tests/test_chaisemartin_dhaultfoeuille.py tests/test_methodology_chaisemartin_dhaultfoeuille.py (189 + 10 passed locally)
to_dataframe(level="by_path") round-trip on R-derived data returns 6 rows × 11 columns as expected
Fixture regenerable: Rscript benchmarks/R/generate_dcdh_dynr_test_values.R (requires R + DIDmultiplegtDYN 2.3.3 + jsonlite)

🤖 Generated with Claude Code

github-actions · 2026-04-24T16:50:03Z

Overall Assessment

⛔ Blocker

Executive Summary

Affected methods: ChaisemartinDHaultfoeuille.by_path and TROP(method="local").
The dCDH by_path additions look methodologically acceptable; the cross-path SE difference from R is explicitly documented in docs/methodology/REGISTRY.md:L641-L645, so I treat that as informational only.
The blocker is the unrelated TROP local-method change set: it reintroduces previously fixed weighting/variance bugs in both Python and Rust.
Those TROP changes silently affect local-TROP tuning, point estimates, and bootstrap SE. A TODO.md entry and an xfail do not mitigate that class of defect.
The PR also removes/downgrades the regression tests that were guarding the exact TROP contracts it breaks.
Review is diff-based; I did not run the suite in this environment.

Methodology

P0 — rust/src/trop.rs:L550-L609 normalizes time_weights and unit_weights, but TROP Eq. 3 and the registry require raw exp(-λ × dist) weights (docs/methodology/REGISTRY.md:L1990-L1999, L2064-L2067). Impact: Rust LOOCV and Rust bootstrap consume the wrong objective scale (rust/src/trop.rs:L435-L499, L1087-L1097), which can change selected λ_nn and downstream ATT/SE. Because TROP defaults to method="local" and finite lambda_nn values are in the default grid (diff_diff/trop.py:L64-L69, L145-L147), this is a default-path silent correctness bug. Concrete fix: remove both normalization steps and restore the raw exponential outer-product weights.
P0 — diff_diff/trop_local.py:L418-L460 reintroduces two methodology errors. First, it only assigns donor weights when D[t,j] == 0, even though the paper’s distance formula already handles treated-cell exclusion via 1{u≠t}(1-W_iu)(1-W_ju) (diff_diff/trop_local.py:L327-L357; docs/methodology/REGISTRY.md:L1990-L1999). Second, the new _precomputed branch reads original-panel Y/D instead of the function arguments; in the bootstrap flow _fit_with_fixed_lambda rebuilds resampled matrices from boot_data (diff_diff/trop_local.py:L1230-L1316) while _precomputed was populated once from the original fit (diff_diff/trop.py:L569-L570). Impact: main-fit local TROP drops same-cohort donors contrary to Eq. 3, and Python bootstrap SE is computed from stale/misaligned distances instead of the resample. Concrete fix: compute ω_j for all j != i, let the existing (1-W) mask handle treated cells, and either clear/rebuild _precomputed per bootstrap replicate or stop consulting it inside _compute_observation_weights.
P3 — ChaisemartinDHaultfoeuille.by_path’s R deviation is documented in docs/methodology/REGISTRY.md:L641-L645, so I do not treat the dCDH portion as a defect. Impact: none. Concrete fix: none.
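
To illustrate why normalization changes the objective scale, here is a small sketch contrasting raw `exp(-λ × dist)` weights with normalized ones. The distances and λ values are invented, and the helpers are hypothetical; this is not the TROP implementation, only the weighting arithmetic the finding refers to.

```python
import math


def raw_weights(distances, lam):
    """Unnormalized exponential weights exp(-lam * dist)."""
    return [math.exp(-lam * d) for d in distances]


def normalized_weights(distances, lam):
    """Same weights rescaled to sum to one (the variant the review flags)."""
    w = raw_weights(distances, lam)
    total = sum(w)
    return [x / total for x in w]


def outer_mass(time_w, unit_w):
    """Total mass of the outer-product weight matrix."""
    return sum(tw * uw for tw in time_w for uw in unit_w)


distances = [0.0, 1.0, 2.0]  # invented distances for illustration
raw_mass = {
    lam: outer_mass(raw_weights(distances, lam), raw_weights(distances, lam))
    for lam in (0.1, 1.0)
}
norm_mass = {
    lam: outer_mass(normalized_weights(distances, lam), normalized_weights(distances, lam))
    for lam in (0.1, 1.0)
}
```

With raw weights the total mass shrinks as λ grows, whereas normalization pins it at one for every λ, so a weighted loss compared across λ values is no longer on the scale the raw formula implies.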

Code Quality

No separate issues beyond the methodology regressions above.

Performance

No material performance findings beyond the fact that the new TROP shortcuts trade correctness for speed, which is already covered as P0.

Maintainability

P3 — This PR mixes unrelated TROP estimator regressions into a dCDH parity-fixture change set (diff_diff/trop_local.py, rust/src/trop.rs, tests/test_rust_backend.py, TODO.md). Impact: review, bisecting, and rollback are harder than necessary. Concrete fix: split the TROP work into a separate PR, or revert those hunks and land only the dCDH by_path changes.

Tech Debt

P3 — The new TODO.md entry records the local-TROP divergence, but per project policy it does not mitigate the P0s because they change estimates/SE silently. Impact: none by itself; it is just not sufficient to merge the regression. Concrete fix: resolve or remove the TROP changes before merge.

Security

No findings.

Documentation/Tests

P2 — tests/test_rust_backend.py:L2423-L2479 converts local bootstrap parity into xfail, and the PR removes TestTROPRustEdgeCaseParity.test_local_method_main_fit_parity and test_local_method_same_cohort_donor_parity from the same file. Impact: the exact regressions above lose CI coverage, and the remaining xfail only exercises lambda_nn=np.inf, so the new finite-λ_nn normalization bug is no longer tested at all. Concrete fix: restore those two regression tests (or equivalent coverage) and keep a non-xfail local bootstrap parity test that includes at least one finite lambda_nn case such as 0.1.

Path to Approval

Revert or fix the TROP local-method hunks so both Python and Rust use unnormalized raw exponential weights and include all j != i donors, with treated-cell exclusion handled only by the existing (1-W) mask.
Remove the _precomputed cache fallthrough from _compute_observation_weights, or rebuild _precomputed from each bootstrap sample before calling _fit_with_fixed_lambda.
Restore the local-TROP regression guards: test_local_method_main_fit_parity, test_local_method_same_cohort_donor_parity, and a non-xfail bootstrap parity test covering both lambda_nn=np.inf and a finite lambda_nn.
If the goal of this PR is only the dCDH by_path parity work, drop the unrelated TROP/TODO/CHANGELOG hunks and land the dCDH subset alone.

igerber · 2026-04-24T17:13:42Z

Rebased onto current origin/main (PR #358 TROP local-method fixes landed after this branch was created, producing phantom "reverted TROP work" findings). Clean diff now contains only the 5 files actually changed by this PR:

 CHANGELOG.md                                     |   1 +
 benchmarks/R/generate_dcdh_dynr_test_values.R    | 153 +++++++++++++++++-
 benchmarks/data/dcdh_dynr_golden_values.json     | 193 +++++++++++++++++++++++
 docs/methodology/REGISTRY.md                     |   2 +-
 tests/test_chaisemartin_dhaultfoeuille_parity.py | 129 +++++++++++++++

No TROP / diff_diff/trop*.py / rust/src/trop.rs / tests/test_rust_backend.py / TODO.md hunks in this PR.

/ai-review

igerber · 2026-04-24T17:15:41Z

/ai-review

github-actions · 2026-04-24T17:20:23Z

🔁 AI review rerun (requested by @igerber)

Head SHA: b40ca7bb5e34d4faa2500141507650196ee94b4c

Overall Assessment

✅ Looks good

Executive Summary

Re-review scope is now limited to dCDH by_path docs, R fixture generation, golden data, and parity tests; the earlier TROP blocker is no longer present in this diff.
Affected method in scope: ChaisemartinDHaultfoeuille.by_path. No estimator/runtime code changed in this PR.
The only methodology deviation I found is the cross-path cohort-sharing SE difference from R, and it is explicitly documented in docs/methodology/REGISTRY.md:L641-L641, so it is informational rather than a defect.
I did not find any unmitigated P0/P1 issues in the current diff.
Two P2 test-harness issues remain: the new parity tests do not enforce the exact point-parity claim, and they can silently skip an SE regression if one side becomes 0/NaN.
I could not run the suite here because pytest is not installed in this environment.

Methodology

P3 informational — Affected method: ChaisemartinDHaultfoeuille.by_path. The cross-path cohort-sharing SE difference versus R is now explicitly documented as a deviation in docs/methodology/REGISTRY.md:L641-L641, which satisfies the review rubric. Impact: none for merge assessment. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

P3 — The fixture generator says the by_path slots are “not version-stable” but only guards on >= 2.3.3 in benchmarks/R/generate_dcdh_dynr_test_values.R:L32-L35. Impact: regenerating the goldens on a later DIDmultiplegtDYN release can silently change the fixture schema/semantics while still passing the guard, weakening reproducibility of the parity artifact. Concrete fix: require exactly 2.3.3 (or an explicit allowlist) for regeneration and fail with a clear message otherwise.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

P2 — The merged docs say the new by_path fixtures match R point estimates exactly (CHANGELOG.md:L14-L14, docs/methodology/REGISTRY.md:L641-L641), but the new parity class uses MIXED_POINT_RTOL = 0.025 for both scenarios in tests/test_chaisemartin_dhaultfoeuille_parity.py:L495-L497 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L576-L594. Impact: a materially worse point-estimate regression could still pass while the documentation continues to claim exact parity. Concrete fix: tighten these two tests to POINT_RTOL = 1e-4 (or scenario-specific exact tolerances supported by the committed goldens), or relax the docs to describe the match as approximate rather than exact.
P2 — _compare_by_path() only checks SE parity when both sides happen to be > 0 in tests/test_chaisemartin_dhaultfoeuille_parity.py:L565-L569. Impact: if Python regresses to 0/NaN SE while the R fixture remains finite, the parity test will silently skip that cell instead of failing, which weakens the main regression guard this PR is adding. Concrete fix: first assert matching finite/missing SE state for every committed (path, horizon) cell, then do the numeric pytest.approx(...) comparison.
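
The second fix can be sketched as a two-step check: compare SE state first, then compare values. The helper names and the "positive vs degenerate" classification are assumptions for illustration; the actual `_compare_by_path()` may differ in detail.

```python
import math


def se_state(se):
    """Classify an SE as 'positive' (finite and > 0) or 'degenerate' (anything else)."""
    return "positive" if math.isfinite(se) and se > 0 else "degenerate"


def check_se_cell(py_se, r_se, rtol, label):
    """Fail on a one-sided SE collapse instead of skipping the cell."""
    # Step 1: both sides must be in the same state.
    if se_state(py_se) != se_state(r_se):
        raise AssertionError(
            f"{label}: SE state mismatch (python={py_se!r}, r={r_se!r})"
        )
    # Step 2: numeric comparison only when both SEs are usable.
    if se_state(r_se) == "positive" and abs(py_se - r_se) > rtol * abs(r_se):
        raise AssertionError(
            f"{label}: SE differs beyond rtol={rtol} (python={py_se!r}, r={r_se!r})"
        )


# Passes: both finite and within 12%.
check_se_cell(1.05, 1.00, rtol=0.12, label="path (0, 1, 1), horizon 1")
```

Under the old `if py_se > 0 and r_se > 0:` guard, a NaN on the Python side would have skipped the comparison entirely; here it raises at step 1.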

… version

Three findings from CI review round 2 (1 P3 + 2 P2):

- P2: `POINT_RTOL` tightened from 0.025 (MIXED_POINT_RTOL) to 1e-9. Observed rtol on all 15 (path, horizon) cells is ~1e-11 across both scenarios, consistent with R's 10-digit JSON rounding. The prior 2.5% tolerance would silently accept ~6 orders of magnitude of point-estimate regression while the docs claim exact R parity.
- P2: `_compare_by_path()` now asserts matching finite/positive state on `py_se` vs `r_se` BEFORE the `pytest.approx` numeric check. Previously the `if py_se > 0 and r_se > 0:` guard silently skipped SE parity when one side degraded to 0/NaN/inf while the other stayed finite; the regression now fails explicitly with a diagnostic citing both values and suggesting a variance-identifiability check.
- P3: `DIDmultiplegtDYN` version pin moved from `>= 2.3.3` to `== 2.3.3` so a future release with changed `by_path` slot semantics cannot silently regenerate the fixture. Comment documents the update protocol: bump the pin AND re-run the parity class when moving to a newer known-compatible version, or extend to an explicit allowlist once a second version is verified.

Tests still pass (2/2 in TestDCDHDynRParityByPath); ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
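
The effect of the tolerance change can be shown with a toy comparison. The values below are invented; the sketch assumes a relative tolerance measured against the R value and only illustrates which drifts each tolerance would accept.

```python
def passes(py_value, r_value, rtol):
    """True when the Python value is within rtol of the R reference."""
    return abs(py_value - r_value) <= rtol * abs(r_value)


r_effect = 1.0                        # illustrative R point estimate
regressed = r_effect * 1.01           # hypothetical 1% regression
rounded = r_effect * (1 + 1e-11)      # drift on the order of JSON rounding

loose_accepts_regression = passes(regressed, r_effect, rtol=0.025)
tight_accepts_regression = passes(regressed, r_effect, rtol=1e-9)
tight_accepts_rounding = passes(rounded, r_effect, rtol=1e-9)
```

The 0.025 tolerance lets the 1% regression through, while 1e-9 rejects it and still tolerates rounding-level differences.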

igerber · 2026-04-24T17:36:12Z

/ai-review

github-actions · 2026-04-24T17:40:05Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 23259ed3657827f01f7ff9fde70559814c92a526

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. Re-review scope here is the by_path parity artifact work (CHANGELOG, R fixture generation, golden data, registry note, and parity tests); no estimator/runtime code changed in this PR.

Executive Summary

Affected method in scope: ChaisemartinDHaultfoeuille.by_path; the implementation contract in diff_diff/chaisemartin_dhaultfoeuille.py:L398-L429 and diff_diff/chaisemartin_dhaultfoeuille.py:L4959-L5139 matches the updated registry note.
The only methodology deviation I found is the cross-path cohort-sharing SE difference versus R, and it is now explicitly documented as a Deviation from R in docs/methodology/REGISTRY.md:L641-L641, so it is mitigated under the review rubric.
The two issues from the prior AI review are addressed: point parity is now tightened to 1e-9, and the SE check now fails on finite/missing-state mismatch instead of silently skipping cells (tests/test_chaisemartin_dhaultfoeuille_parity.py:L495-L592).
One new P2 test-harness gap remains: the new top-k scenario still does not assert frequency_rank, even though scenario 14 was designed with unique path counts specifically to make ranking deterministic.
One P3 maintainability issue remains in the R generator: scenario 14 comments/params still imply a stochastic p_switch-driven DGP, but the new branch is deterministic and does not read p_switch.
I could not run the suite here because this environment lacks pytest, and importing the estimator path also fails due to missing pandas.

Methodology

P3 informational — Affected method: ChaisemartinDHaultfoeuille.by_path. The cross-path cohort-sharing SE difference versus R is now explicitly documented in docs/methodology/REGISTRY.md:L641-L641, and that note is consistent with the in-code by_path contract in diff_diff/chaisemartin_dhaultfoeuille.py:L416-L424 and diff_diff/chaisemartin_dhaultfoeuille.py:L5092-L5130. Impact: none for merge assessment; this is a documented deviation, not an undocumented methodology defect. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

P3 — Scenario 14’s generator comments and recorded params still describe a stochastic, p_switch-driven design, but the new multi_path_reversible branch is fully deterministic and never reads p_switch (benchmarks/R/generate_dcdh_dynr_test_values.R:L113-L181, benchmarks/R/generate_dcdh_dynr_test_values.R:L620-L638). Impact: fixture provenance is misleading; future regenerations may incorrectly assume p_switch changes the scenario or that random post-switch toggling is still being exercised. Concrete fix: either remove p_switch from the scenario call/recorded params and rewrite the scenario 14 comments to describe deterministic path assignment, or actually wire p_switch into the DGP and update the docs/changelog to match.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

P2 — The new parity harness still does not assert the public frequency_rank contract, even though scenario 14 was explicitly constructed with unique path frequencies to make top-k ranking unambiguous (tests/test_chaisemartin_dhaultfoeuille_parity.py:L545-L592; generator rationale at benchmarks/R/generate_dcdh_dynr_test_values.R:L144-L157). Impact: a regression in path ranking/order would silently pass as long as the selected path set and per-path effects remain correct, weakening the main regression guard for the “top-k ranking case” this PR adds. Concrete fix: assert py_path["frequency_rank"] == r_path_entry["frequency_rank"] at least for multi_path_reversible_by_path, and ideally also assert that results.to_dataframe(level="by_path") is ordered by those ranks.
Previous test-harness findings are resolved: exact-ish point-parity enforcement and SE-state mismatch enforcement are both now present in tests/test_chaisemartin_dhaultfoeuille_parity.py:L495-L592.
Execution note: not run locally because pytest and pandas are unavailable in this environment.
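
The suggested `frequency_rank` assertions might look roughly like this. The dict and row layouts are assumptions: plain dicts stand in for the Python results and the R fixture, and a list of row dicts stands in for the output of `to_dataframe(level="by_path")`.

```python
def check_frequency_ranks(py_paths, r_paths):
    """Assert each committed path has the same frequency_rank on both sides."""
    for label, r_entry in r_paths.items():
        py_rank = py_paths[label]["frequency_rank"]
        if py_rank != r_entry["frequency_rank"]:
            raise AssertionError(
                f"path {label}: rank {py_rank} != R rank {r_entry['frequency_rank']}"
            )


def is_rank_ordered(rows):
    """True when rows appear in non-decreasing frequency_rank order."""
    ranks = [row["frequency_rank"] for row in rows]
    return ranks == sorted(ranks)


# Illustrative data only.
py_paths = {(0, 1, 1): {"frequency_rank": 1}, (0, 1, 0): {"frequency_rank": 2}}
r_paths = {(0, 1, 1): {"frequency_rank": 1}, (0, 1, 0): {"frequency_rank": 2}}
rows = [
    {"path": (0, 1, 1), "horizon": 1, "frequency_rank": 1},
    {"path": (0, 1, 1), "horizon": 2, "frequency_rank": 1},
    {"path": (0, 1, 0), "horizon": 1, "frequency_rank": 2},
]

check_frequency_ranks(py_paths, r_paths)
ordered = is_rank_ordered(rows)
```

A set-equality check on path labels alone would pass even if the two ranks were swapped; the rank assertion closes that gap.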

Wave 1 of the by_path follow-up sequence. Locks in the per-path SE convention against R DIDmultiplegtDYN 2.3.3 before downstream waves (inference completion, design variants, survey) build on it.

Extends benchmarks/R/generate_dcdh_dynr_test_values.R with an extract_dcdh_by_path helper and two scenarios:

- mixed_single_switch_by_path: 2 observed paths, by_path=2
- multi_path_reversible_by_path: 4 observed paths, by_path=3

The multi_path_reversible DGP is deterministic: path assignment is a function of F_g so each (D_{g,1}, F_g, S_g) cohort contains switchers from a single path. This keeps the cohort-recentered IF comparable to R's per-path re-run; cross-path cohort sharing is the documented SE divergence mechanism.

New TestDCDHDynRParityByPath in tests/test_chaisemartin_dhaultfoeuille_parity.py matches paths by tuple label via set-equality (robust to R's undocumented frequency-tie tiebreak) and hard-asserts per-path switcher counts before SE comparison.

Observed parity: point estimates and switcher counts match R exactly; per-path SE within 12% rtol (Phase 2 envelope): scenario 13 max 10.2%, scenario 14 max 4.2%.

REGISTRY.md Note (Phase 3 by_path...) updated with R-parity confirmation and a **Deviation from R (cross-path cohort-sharing SE)** bullet describing the mechanism under which full-panel cohort-centered plug-in (ours) and per-path re-run (R's) diverge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


…4 doc cleanup

Two findings (1 P2 + 1 P3):

- P2: `_compare_by_path` now asserts `py_path["frequency_rank"] == r_path_entry["frequency_rank"]` for every committed path. Both scenarios are constructed with unique path frequencies (scenario 13 via the mixed_single_switch pattern, scenario 14 via deterministic counts 40/25/10/5), so rank ordering is unambiguous and any regression in top-k tiebreak handling now fails explicitly instead of passing silently as long as the selected path set and per-path effects remain correct.
- P3: scenario 14 generator docstring and recorded params still described the old stochastic `p_switch`-driven DGP (the pre-PR variant that blew out SE parity via cross-path cohort mixing). The `multi_path_reversible` pattern is now DETERMINISTIC: path assignment is a fixed function of F_g with counts 20/20/15/10/10/5 across the 6 F_g values. `p_switch = 0.35` dropped from both the scenario call and the `params` block in the fixture; comment block rewritten to describe the deterministic design and cite the REGISTRY note for the rationale behind the design choice.

Fixture regenerated; scenario 14 params no longer carry the stale `p_switch` entry. Point and SE parity numbers unchanged (deterministic DGP produces the same treatment matrix as before). Tests pass (2/2); ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
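
The count arithmetic in this commit message can be checked with a short sketch. The per-F_g counts (20/20/15/10/10/5) and the per-path totals (40/25/10/5) come from the message; the grouping of F_g slots into paths and the path labels `A` to `D` are assumptions chosen so the totals line up, not the generator's actual mapping.

```python
from collections import Counter

# Group counts for the six F_g values, as stated in the commit message.
groups_per_f_slot = [20, 20, 15, 10, 10, 5]

# ASSUMPTION: which F_g slot feeds which path. Each slot maps to exactly one
# path, which is what keeps every cohort single-path.
path_of_f_slot = ["A", "A", "B", "B", "C", "D"]

path_counts = Counter()
for n_groups, path in zip(groups_per_f_slot, path_of_f_slot):
    path_counts[path] += n_groups

# Unique counts mean the frequency ranking has no ties to break.
has_ties = len(set(path_counts.values())) != len(path_counts)
ranks = {
    path: rank
    for rank, (path, _) in enumerate(path_counts.most_common(), start=1)
}
top_3 = [path for path, rank in sorted(ranks.items(), key=lambda kv: kv[1]) if rank <= 3]
```

With by_path=3, the three most frequent paths are selected and the least frequent one is left out, with no dependence on any tiebreak rule.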

igerber · 2026-04-24T17:49:28Z

/ai-review

github-actions · 2026-04-24T17:53:06Z

🔁 AI review rerun (requested by @igerber)

Head SHA: daefda78e347896ba56562f30935315bac3c5c95

Overall Assessment

✅ Looks good

No unmitigated P0/P1 findings. Re-review scope here is the new by_path parity artifact work; the estimator/runtime implementation for ChaisemartinDHaultfoeuille.by_path is unchanged in this PR.

Executive Summary

Affected method in scope: ChaisemartinDHaultfoeuille.by_path. The updated registry/changelog language matches the existing in-code by_path contract in diff_diff/chaisemartin_dhaultfoeuille.py:L398-L429 and diff_diff/chaisemartin_dhaultfoeuille.py:L4959-L5139.
The only methodology deviation in scope is the cross-path cohort-sharing SE difference versus R, and it is now explicitly documented as a Deviation from R in docs/methodology/REGISTRY.md:L641-L641, so it is mitigated under the review rubric.
The prior AI-review issues are addressed: the parity harness now asserts frequency_rank and finite/missing SE-state consistency in tests/test_chaisemartin_dhaultfoeuille_parity.py:L549-L606.
The R generator/comments are now internally consistent about the deterministic multi_path_reversible DGP, and fixture generation is pinned to exact DIDmultiplegtDYN 2.3.3 in benchmarks/R/generate_dcdh_dynr_test_values.R:L32-L40 and benchmarks/R/generate_dcdh_dynr_test_values.R:L113-L181, benchmarks/R/generate_dcdh_dynr_test_values.R:L620-L644.
I could not run the suite locally because this environment is missing numpy, pandas, and pytest.

Methodology

P3 informational — The cross-path cohort-sharing SE divergence versus R is explicitly documented in docs/methodology/REGISTRY.md:L641-L641, and that note is consistent with the existing by_path implementation contract in diff_diff/chaisemartin_dhaultfoeuille.py:L5092-L5130. Impact: none for merge assessment; this is a documented deviation, not an undocumented methodology defect. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

P3 informational — The nearby mixed-direction SE parity limitation remains tracked in TODO.md:L61-L61 and is not worsened by this PR’s per-path fixtures/tests. Impact: none for merge assessment. Concrete fix: none in this PR.

Security

No findings.

Documentation/Tests

No findings. The prior test-harness gap on frequency_rank is closed in tests/test_chaisemartin_dhaultfoeuille_parity.py:L549-L560, and the earlier SE-state blind spot is closed in tests/test_chaisemartin_dhaultfoeuille_parity.py:L585-L606.

When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are now computed across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights`: Rademacher / Mammen / Webb) is drawn per path and broadcast across all valid horizons, producing correlated bootstrap distributions. The path-specific critical value `c_p = quantile(max_l |t_l|, 1-α)` is applied per horizon as `cband_conf_int = (eff - c_p·se, eff + c_p·se)` and surfaced at top level as `results.path_sup_t_bands[path]`. Closes Wave 2 #4 of the by_path follow-up sequence (#357 foundation, #360 R-parity, #364 bootstrap, #371 placebos).

**Methodology asymmetry vs OVERALL** (intentional, documented): per-path sup-t draws fresh shared weights AFTER the per-path SE bootstrap block has populated `path_ses` via independent per-(path, horizon) draws. Asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Preserves RNG-state isolation for existing per-path SE seed-reproducibility tests.

**Gates** mirror OVERALL: `>=2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Otherwise the path is absent from `path_sup_t_bands`.

**Empty-state contract**: `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates (covers two cases: `path_effects == {}` upstream OR all paths fail gates downstream).

**Deviation from R**: `did_multiplegt_dyn` provides no joint / sup-t bands at any surface; this is a Python-only methodology extension consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`.

**Bundled pre-audit fix** (sibling-surface check): the existing OVERALL `sup_t_bands` field's stale "Phase 2 placeholder" docstring updated to the actual contract description.

Tests: new `TestByPathSupTBands` class with 13 tests covering: attr None when no bootstrap / no by_path; keys match `path_effects` with finite crit; band wider than pointwise; crit finite and positive; seed reproducibility; single-horizon-path-skip; L_max=1 skip; n_valid_horizons matches; absent-path-no-cband-keys; summary renders; empty-dict-when-no-complete-window; strict-majority-gate-at-exact-50pct (monkeypatches the weight generator to inject NaN into half the bootstrap rows, asserting both `sup_t_bands is None` and `path_sup_t_bands == {}` at the boundary). All `@pytest.mark.slow`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
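
The sup-t construction described above can be sketched for a single path as follows. This is a generic multiplier-bootstrap illustration under stated assumptions, not the library's implementation: the function name, the per-unit influence-value inputs, the `"crit_value"` key, and the way each bootstrap t-statistic is formed are hypothetical; only Rademacher weights are shown.

```python
import math
import random


def sup_t_band(effects, ses, if_values, n_bootstrap=999, alpha=0.05, seed=0):
    """Joint sup-t band for one path.

    effects, ses: {horizon: float}; if_values: {horizon: [per-unit influence values]}.
    Returns None when either gate fails.
    """
    valid = [l for l in effects if math.isfinite(ses[l]) and ses[l] > 0]
    if len(valid) < 2:  # gate 1: at least two valid horizons
        return None
    n = len(if_values[valid[0]])
    rng = random.Random(seed)
    sup_draws = []
    for _ in range(n_bootstrap):
        # One Rademacher draw per unit, shared across all horizons so the
        # per-horizon bootstrap statistics stay correlated.
        w = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        t_abs = [
            abs(sum(wi * v for wi, v in zip(w, if_values[l])) / n / ses[l])
            for l in valid
        ]
        if all(math.isfinite(t) for t in t_abs):
            sup_draws.append(max(t_abs))
    if 2 * len(sup_draws) <= n_bootstrap:  # gate 2: strict majority finite
        return None
    sup_draws.sort()
    idx = min(len(sup_draws) - 1, math.ceil((1 - alpha) * len(sup_draws)) - 1)
    crit = sup_draws[idx]  # c_p = quantile(max_l |t_l|, 1 - alpha)
    return {
        "crit_value": crit,
        "cband_conf_int": {
            l: (effects[l] - crit * ses[l], effects[l] + crit * ses[l]) for l in valid
        },
    }


# Illustrative inputs: simulated influence values and plug-in SEs.
data_rng = random.Random(1)
n_units = 200
if_values = {l: [data_rng.gauss(0.0, 1.0) for _ in range(n_units)] for l in (1, 2, 3)}
effects = {1: 0.5, 2: 0.8, 3: 1.1}
ses = {l: math.sqrt(sum(v * v for v in vals)) / n_units for l, vals in if_values.items()}

band = sup_t_band(effects, ses, if_values)
single = sup_t_band({1: 0.5}, {1: ses[1]}, {1: if_values[1]})  # fails gate 1
```

Returning `None` for a path that fails a gate mirrors the "absent from `path_sup_t_bands`" behaviour in the description; a caller would simply omit that path from the resulting dict.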

igerber force-pushed the dcdh-by-path-rparity branch from ca4df9d to b40ca7b on April 24, 2026 17:13

igerber and others added 3 commits April 24, 2026 13:47

igerber force-pushed the dcdh-by-path-rparity branch from 23259ed to daefda7 on April 24, 2026 17:49

igerber added the ready-for-ci label (Triggers CI test workflows) Apr 24, 2026

igerber merged commit 7a59b8c into main Apr 24, 2026
21 of 22 checks passed

igerber deleted the dcdh-by-path-rparity branch April 24, 2026 19:39

This was referenced Apr 25, 2026

Release 3.3.0: HAD estimator, profile_panel, dCDH by_path, SDID survey complete #368

Merged

Add per-path joint sup-t bands to ChaisemartinDHaultfoeuille.by_path #374

Merged

igerber merged 3 commits into main from dcdh-by-path-rparity
